First, colocalize the Darwin data with the HOT data.
## Set a location for the plots
outputdir = file.path("~", "depthplots")
dir.create(outputdir, recursive = TRUE)
## Colocalize the data
initialize_cmap(cmap_key = "5e05c500-d68d-11e9-9d3b-4f83fcec4710")
## $base_url
## [1] "https://simonscmap.com"
##
## $route
## [1] "/api/data/sp?"
##
## $api_key
## [1] "Api-Key 5e05c500-d68d-11e9-9d3b-4f83fcec4710"
df = get_metadata('tblHOT_Bottle', 'HPLC_chla_bottle_hot')
## dat = compile(sourceTable='tblHOT_Bottle',
## sourceVar='HPLC_chla_bottle_hot',
## targetTables="tblDarwin_Ecosystem",
## targetVars="CHL",
## dt1="2013-03-03",
## dt2="2015-04-03",
## ## lat1=21.847,
## ## lat2=22.75,
## ## lon1=-158.363,
## ## lon2=-157.9,
## lat1 = df$Lat_Min,
## lat2 = df$Lat_Max,
## lon1 = df$Lon_Min,
## lon2 = df$Lon_Max,
## ## depth1=0.3,
## ## depth2=4813.8,
## depth1 = df$Depth_Min,
## depth2 = df$Depth_Max,
## temporalTolerance=c(3),
## latTolerance=c(1),
## lonTolerance=c(1),
## depthTolerance=c(5))
## saveRDS(dat, file= file.path(outputdir, "dat.RDS"))
dat = readRDS(file= file.path(outputdir, "dat.RDS"))
## See a summary of this data.
dat %>% summary() %>% print()
## time lat lon depth HPLC_chla_bottle_hot HPLC_chla_bottle_hot_std
## Min. :2013-03-06 00:00:00 Min. :21.85 Min. :-158.4 Min. : 1.7 Min. : 8.0 Min. :0
## 1st Qu.:2013-10-01 00:00:00 1st Qu.:22.75 1st Qu.:-158.0 1st Qu.: 74.4 1st Qu.: 93.5 1st Qu.:0
## Median :2014-02-14 00:00:00 Median :22.75 Median :-158.0 Median : 150.2 Median :146.0 Median :0
## Mean :2014-02-28 15:20:45 Mean :22.68 Mean :-158.0 Mean : 558.8 Mean :162.3 Mean :0
## 3rd Qu.:2014-09-16 00:00:00 3rd Qu.:22.75 3rd Qu.:-158.0 3rd Qu.: 549.9 3rd Qu.:215.5 3rd Qu.:0
## Max. :2015-03-30 00:00:00 Max. :22.75 Max. :-157.9 Max. :4810.5 Max. :535.0 Max. :0
## NA's :5735 NA's :5743
## CHL CHL_std
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0193 1st Qu.:0.0029
## Median :0.0284 Median :0.0064
## Mean :0.0497 Mean :0.0122
## 3rd Qu.:0.0711 3rd Qu.:0.0186
## Max. :0.2122 Max. :0.0772
## NA's :2322 NA's :2322
dat %>% dim() %>% print()
## [1] 5982 8
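The summary suggests that many rows are missing one of the two chlorophyll measurements, so it helps to know how many rows can actually be compared. The sketch below counts rows where both sources have a value. It uses a small made-up data frame (`toy`, hypothetical values) in place of `dat`, with the same column names as above.

```r
## Toy stand-in for `dat`; the values are made up for illustration only.
toy = data.frame(
  depth = c(5, 25, 45, 75, 100),
  HPLC_chla_bottle_hot = c(80, NA, 150, 210, NA),
  CHL = c(0.02, 0.03, NA, 0.07, NA)
)

## Flag rows where both the HOT and the Darwin value are present.
both = !is.na(toy$HPLC_chla_bottle_hot) & !is.na(toy$CHL)

## Number and fraction of rows usable for a direct comparison.
n_both = sum(both)
frac_both = mean(both)
print(c(n_both = n_both, frac_both = frac_both))
```

Replacing `toy` with `dat` should give the corresponding counts for the colocalized data.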
Now, make plots of CHL from the two data sources, over time:
## Make plots for each unique sampling time
times = dat$time %>% unique()
for(ii in seq_along(times)){
onetime = times[ii]
onedat = dat %>% select(lat, lon, time, CHL, depth, HPLC_chla_bottle_hot)
## Isolate to a single time point
onedat = onedat %>% filter(time == onetime)
onedat = onedat %>% arrange(depth)
if(nrow(onedat) == 0) next
## Make three plots
## png(file = file.path(outputdir, paste0("depthplot-", ii, ".png")),
## width = 1200, height = 500)
par(mfrow=c(1,3), cex=1.2, oma = c(1,1,4,1))
if(any(!is.na(onedat$CHL))){
plot(y = onedat$CHL, x = onedat$depth,
type='p', cex = 1.5, log="x", pch = 16, xlab = "Depth", ylab = "Darwin CHL", col='grey50')
} else {
plot(NA, ylim=c(0,1))
}
## Label the whole figure with the sampling time, whether or not Darwin CHL is present
mtext(onetime, side = 3, outer=TRUE, cex=2)
title(main="Darwin Chl-A vs depth")
if(any(!is.na(onedat$HPLC_chla_bottle_hot))){
plot(y = onedat$HPLC_chla_bottle_hot,
x = onedat$depth,
type='p', cex=1.5, pch=16, log='x', xlab = "Depth", ylab = "HOT CHLA", col='grey50')
title(main="HOT Chl-A vs depth")
plot(y=onedat$CHL, x=onedat$HPLC_chla_bottle_hot, pch=16, ylab = "Darwin CHL", xlab = "HOT CHLA", col='grey50', cex=1.5)
title(main="Scatterplot between two sources")
} else {
plot(NA, ylim=c(0,1))
title(main="HOT Chl-A vs depth")
plot(NA, ylim=c(0,1))
title(main="Scatterplot between two sources")
}
## graphics.off()
}
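The scatterplot panel can also be reduced to one number per sampling time. The sketch below computes a Spearman rank correlation between the two sources for each time point. Rank correlation is an assumption made here, chosen because the two columns are on very different scales; it is not something the analysis above uses. The data frame `toy` is made up and stands in for `dat`.

```r
## Toy stand-in for `dat`; the values are made up for illustration only.
toy = data.frame(
  time = as.Date(rep(c("2013-03-06", "2013-04-02"), each = 4)),
  depth = rep(c(5, 25, 75, 125), times = 2),
  HPLC_chla_bottle_hot = c(80, 95, 210, 160, 70, NA, 190, 230),
  CHL = c(0.02, 0.03, 0.07, 0.05, 0.02, 0.03, NA, 0.06)
)

## One rank correlation per time point, using only rows where both
## sources are present. Return NA when fewer than 3 pairs are available.
rank_cor = sapply(split(toy, toy$time), function(d) {
  ok = complete.cases(d$HPLC_chla_bottle_hot, d$CHL)
  if (sum(ok) < 3) return(NA_real_)
  cor(d$HPLC_chla_bottle_hot[ok], d$CHL[ok], method = "spearman")
})
print(rank_cor)
```

Time points with too few paired values come back as `NA` instead of a misleading correlation.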
The Darwin data is here: https://simonscmap.com/catalog/datasets/Darwin_Ecosystem
The HOT data is here: https://simonscmap.com/catalog/datasets/HOT_Bottle_ALOHA
Another dataset also contains Chl-a data, but from a different source: https://simonscmap.com/catalog/datasets/HOT_PP
Chris Follet strongly recommends CTD data:
“My first instinct is to try and use depth profiles from Station ALOHA, focusing on the fluorescence derived Chlorophyll from the CTD drops. There is also some very high resolution ‘Wire Walker’ Data for a handful of cruises. I think that both of these data types are in CMAP. My recollection is that Aditya built an R package to deal with CMAP so this is probably easy to get/use? If this is so, there is a very clean way to start using the DARWIN model CHL which is also in CMAP https://simonscmap.com/catalog/datasets/Darwin_Ecosystem at 3 day depth resolution. (I added Stephanie to the chain in case she has thoughts here).”
“I would recommend also using the CTD data (realtime sensor dropped down the wire: chl from fluorescence, but there is also Salinity, Temperature, densitiy, and nitrate which can be compared directly to DARWIN) because it is of much higher depth resolution. I think any precision lost using fluorescence is more than gained back in the shape resolution. I am very sure this data is also in CMAP, but to make sure you actually receive an email reply from me the data website from HOT is https://hahana.soest.hawaii.edu/hot/hot-dogs/cextraction.html which should help in finding the correct table in CMAP.”
I think this is the right CTD dataset: https://simonscmap.com/catalog/datasets/HOT_CTD
A preliminary look at the data suggests that OMD is a good fit for explaining the difference. A scatterplot of chlorophyll by depth shows some mass shifts along the depth axis that aren’t explained well by:
The next steps are to (1) colocalize and download the CTD data, (2) make visual comparisons, and (3) apply OMD.
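As a rough first check for the visual comparison step, and before applying OMD, one could compare the depth at which each profile peaks. The sketch below is not OMD itself; it only measures the offset between the two chlorophyll maxima in a single profile. All values and variable names (`hot_chla`, `darwin_chl`, `dcm_hot`, `dcm_darwin`) are hypothetical.

```r
## Made-up single-time profile on a shared depth grid (illustration only).
depth = c(5, 25, 45, 75, 100, 125, 150, 175)
hot_chla = c(80, 90, 120, 180, 240, 200, 120, 60)
darwin_chl = c(0.02, 0.03, 0.06, 0.09, 0.07, 0.05, 0.03, 0.01)

## Depth of the chlorophyll maximum in each source.
dcm_hot = depth[which.max(hot_chla)]
dcm_darwin = depth[which.max(darwin_chl)]

## Negative values mean the Darwin maximum sits shallower than the HOT one.
shift = dcm_darwin - dcm_hot
print(c(dcm_hot = dcm_hot, dcm_darwin = dcm_darwin, shift = shift))
```

A consistent nonzero `shift` across time points would be one simple sign of a displacement along the depth axis.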